56 research outputs found

    Curriculum Learning for Handwritten Text Line Recognition

    Full text link
    Recurrent Neural Networks (RNNs) have recently achieved the best performance in off-line handwritten text recognition. At the same time, training RNNs by gradient descent converges slowly, and training times are particularly long when the training database consists of full lines of text. In this paper, we propose an easy way to accelerate stochastic gradient descent in this setup, and in the general context of learning to recognize sequences. The principle, called curriculum learning, or shaping, is to first learn to recognize short sequences before training on all available training sequences. Experiments on three different handwritten text databases (Rimes, IAM, OpenHaRT) show that a simple implementation of this strategy can significantly speed up the training of RNNs for text recognition, and even significantly improve performance in some cases.
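    As a rough illustration of the idea (the paper's exact schedule is not reproduced here, and all helper names are hypothetical), a length-based curriculum can be implemented by sorting the training lines by transcript length and progressively admitting longer ones into the SGD batches:

```python
import random

def curriculum_batches(samples, num_epochs, batch_size):
    """Yield training batches, starting with the shortest text lines
    and gradually admitting longer ones (a length-based curriculum).

    `samples` is a list of (image, transcript) pairs; the schedule here
    (linear growth of the admitted fraction) is an illustrative
    assumption, not the paper's recipe.
    """
    ordered = sorted(samples, key=lambda s: len(s[1]))  # shortest transcripts first
    for epoch in range(num_epochs):
        # Fraction of the length-sorted data admitted this epoch.
        frac = min(1.0, (epoch + 1) / (num_epochs // 2 + 1))
        pool = ordered[: max(batch_size, int(frac * len(ordered)))]
        random.shuffle(pool)  # shuffle within the admitted pool
        for i in range(0, len(pool) - batch_size + 1, batch_size):
            yield pool[i : i + batch_size]
```

    Shuffling within the admitted pool keeps the gradient estimates stochastic; the curriculum only caps the maximum sequence difficulty per epoch.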

    Multiple Document Datasets Pre-training Improves Text Line Detection With Deep Neural Networks

    Full text link
    In this paper, we introduce a fully convolutional network for the document layout analysis task. While state-of-the-art methods use models pre-trained on natural scene images, our method, Doc-UFCN, relies on a U-shaped model trained from scratch to detect objects in historical documents. We treat the line segmentation task, and more generally the layout analysis problem, as a pixel-wise classification task, so our model outputs a pixel labeling of the input images. We show that Doc-UFCN outperforms state-of-the-art methods on various datasets and also demonstrate that parts pre-trained on natural scene images are not required to reach good results. In addition, we show that pre-training on multiple document datasets can improve performance. We evaluate the models using various metrics to allow a fair and complete comparison between the methods.
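    A minimal sketch of the pixel-wise formulation this abstract describes, assuming PyTorch; the toy network below keeps only the U shape (encoder, decoder, skip connection) and is far shallower than the actual Doc-UFCN:

```python
import torch
import torch.nn as nn

class TinyUFCN(nn.Module):
    """Toy U-shaped fully convolutional network for pixel-wise labeling.

    Illustrates the formulation only: one encoder stage, one decoder
    stage and a skip connection; the real Doc-UFCN is deeper.
    """
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.head = nn.Conv2d(32, num_classes, 1)  # per-pixel class logits

    def forward(self, x):
        e = self.enc(x)                       # full-resolution features
        m = self.mid(self.down(e))            # coarse features
        u = self.up(m)                        # back to full resolution
        return self.head(torch.cat([e, u], dim=1))  # skip connection + 1x1 head

# A (batch, 3, H, W) image maps to (batch, num_classes, H, W) logits:
logits = TinyUFCN(num_classes=2)(torch.randn(1, 3, 64, 64))
```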

    Key-value information extraction from full handwritten pages

    Full text link
    We propose a Transformer-based approach for information extraction from digitized handwritten documents. Our approach combines, in a single model, the steps that were so far performed by separate models: feature extraction, handwriting recognition and named entity recognition. We compare this integrated approach with traditional two-stage methods that perform handwriting recognition before named entity recognition, and present results at different levels: line, paragraph, and page. Our experiments show that attention-based models are especially interesting when applied to full pages, as they do not require any prior segmentation step. Finally, we show that they are able to learn from key-value annotations: a list of important words with their corresponding named entities. We compare our models to state-of-the-art methods on three public databases (IAM, ESPOSALLES, and POPP) and outperform previously reported results on all three datasets.
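    One way to picture the key-value training signal (the tag format below is an assumption for illustration, not necessarily the paper's serialization) is to flatten each annotation into a single tagged target string for the sequence-to-sequence model:

```python
def keyvalue_to_target(fields: dict[str, str]) -> str:
    """Serialize key-value annotations (important words and their named
    entities) into one target string for a Transformer decoder.

    The XML-like field markers are an illustrative assumption.
    """
    return " ".join(f"<{key}> {value} </{key}>" for key, value in fields.items())

# e.g. a record annotated only with important words and their entities:
target = keyvalue_to_target({"husband": "Josep Puig", "wife": "Maria Ferrer"})
# -> "<husband> Josep Puig </husband> <wife> Maria Ferrer </wife>"
```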

    AI in the Service of Content Indexing in Libraries (L'IA au service de l'indexation des contenus en bibliothèque)

    Get PDF
    Slides from Christopher Kermorvant's talk at the Biennale du numérique 2023, "Intelligence artificielle : écosystèmes, enjeux, usages" (Artificial intelligence: ecosystems, challenges, uses).

    SIMARA: a database for key-value information extraction from full pages

    Full text link
    We propose a new database for information extraction from historical handwritten documents. The corpus includes 5,393 finding aids from six different series, dating from the 18th to the 20th century. Finding aids are handwritten documents that contain metadata describing older archives. They are stored in the National Archives of France and are used by archivists to identify and find archival documents. Each document is annotated at page level and contains seven fields to retrieve. The localization of each field is not provided, so this dataset encourages research on segmentation-free systems for information extraction. We propose a model based on the Transformer architecture trained for end-to-end information extraction, and provide three sets for training, validation and testing to ensure fair comparison with future works. The database is freely accessible at https://zenodo.org/record/7868059.
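    Since the annotations are page-level strings with no localization, a plausible (hypothetical) evaluation for such a segmentation-free task simply compares the predicted and reference value for each field; the abstract does not list the seven field names, so they are left abstract here:

```python
def evaluate_page(predicted: dict[str, str], reference: dict[str, str]) -> float:
    """Exact-match accuracy over the page-level fields of one finding aid.

    Annotations carry no localization, so a system is scored only on the
    extracted strings, never on where they appear on the page.
    """
    keys = reference.keys()
    return sum(predicted.get(k) == reference[k] for k in keys) / len(keys)
```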

    Large-scale genealogical information extraction from handwritten Quebec parish records

    Get PDF
    This paper presents a complete workflow designed for extracting information from Quebec handwritten parish registers. The acts in these documents contain individual and family information highly valuable for genetic, demographic and social studies of the Quebec population. From an image of parish records, our workflow is able to identify the acts and extract personal information. The workflow is divided into successive steps: page classification, text line detection, handwritten text recognition, named entity recognition, and act detection and classification. For all these steps, different machine learning models are compared. Once the information is extracted, validation rules designed by experts are applied to standardize the extracted information and ensure its consistency with the type of act (birth, marriage or death). This validation step is able to reject records that are considered invalid or merged. The full workflow has been used to process over two million pages of Quebec parish registers from the 19th and 20th centuries. On a sample comprising 65% of the registers, 3.2 million acts were recognized. Verification of the birth and death acts from this sample shows that 74% of them are considered complete and valid. These records will be integrated into the BALSAC database and linked together to recreate family and genealogical relations at large scale.
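    A sketch of how the successive steps might chain together; every stage function here is a hypothetical stand-in for the models compared in the paper, and only the order of the steps follows the text:

```python
def process_page(image, models, validators):
    """Run the extraction workflow on one register page.

    `models` maps stage names to callables; `validators` maps an act type
    (birth, marriage, death) to a list of expert consistency rules.
    """
    if models["page_classifier"](image) == "not_an_act_page":
        return []
    lines = []
    for line_img in models["line_detector"](image):
        text = models["htr"](line_img)      # handwritten text recognition
        lines.append(models["ner"](text))   # named entity recognition
    results = []
    for act in models["act_detector"](lines):  # group lines into typed acts
        # Expert validation: standardize values and check consistency
        # with the act type; inconsistent records can be rejected.
        ok = all(rule(act) for rule in validators.get(act["type"], []))
        results.append({**act, "valid": ok})
    return results
```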

    Landscape Analysis for the Specimen Data Refinery

    Get PDF
    This report reviews the current state of the art in applied approaches to automated tools, services and workflows for extracting information from images of natural history specimens and their labels. We consider the potential for repurposing existing tools, including workflow management systems, and areas where more development is required. This paper was written as part of the SYNTHESYS+ project for software development and informatics teams working on new software-based approaches to improve mass digitisation of natural history specimens.

    A Comparison of Noise Reduction Techniques for Robust Speech Recognition

    No full text
    This report presents the integration of several noise reduction methods into the front-end for speech recognition developed at IDIAP. The chosen methods are: Spectral Subtraction, Cepstral Mean Subtraction and Blind Equalization. These different methods are studied from a theoretical point of view. Their implementation is described and they are tested on the Numbers95 speech database. Good noise robustness is obtained by combining two of these methods, such as Spectral Subtraction with Cepstral Mean Subtraction or Spectral Subtraction with Blind Equalization. The latter combination is found to be more appropriate for real recognition systems since it is frame-synchronous. A comparison with Jah-RASTA-PLP is also given. Acknowledgements: The support of the OFES under the grant for the "Speech, Hearing and Recognition" (SPHEAR) project # OFES 970299 is gratefully acknowledged. The work described in this report benefited from fruitful discussions with Chafic Mokbel.
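    The first two methods are simple enough to sketch directly; a minimal version, assuming magnitude spectrograms and cepstral features as NumPy arrays (the report's exact parameter settings are not reproduced here):

```python
import numpy as np

def spectral_subtraction(mag, noise_mag, alpha=1.0, floor=0.02):
    """Magnitude-domain spectral subtraction.

    mag:       (frames, bins) magnitude spectrogram of noisy speech
    noise_mag: (bins,) noise estimate, e.g. averaged over leading
               non-speech frames
    `alpha` (over-subtraction) and the spectral floor are standard knobs.
    """
    clean = mag - alpha * noise_mag
    return np.maximum(clean, floor * mag)  # flooring avoids negative magnitudes

def cepstral_mean_subtraction(cepstra):
    """Remove the per-utterance mean from each cepstral coefficient,
    cancelling stationary convolutional (channel) effects."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```

    Note that Cepstral Mean Subtraction, as written here, needs the whole utterance before it can subtract the mean, which is consistent with the report's preference for frame-synchronous alternatives in real systems.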